Web Programming
CIS 193 – Go Programming
Prakhar Bhandari, Adel Qalieh
Go code is organized into packages - we've been using packages throughout the semester!
All of the files in a package are in the same directory
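    package main

    import (
        "fmt"
        "math/rand"
        "strings"
    )

    func main() {
        // Every imported package must be used, or the program will not compile.
        fmt.Println(strings.Repeat("-", 10))
        fmt.Println(rand.Int())
    }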
To rename an import, place the desired name before the import path. This is important when imported package names clash.
    import (
        "crypto/rand"
        mrand "math/rand"
    )
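As a minimal sketch of why the rename matters, the program below uses both packages in one file. The helper names randomToken and rollDie are made up for this illustration.

```go
package main

import (
	"crypto/rand"
	"fmt"
	mrand "math/rand"
)

// randomToken returns n cryptographically secure random bytes.
// It uses crypto/rand, referred to by its default name "rand".
func randomToken(n int) ([]byte, error) {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		return nil, err
	}
	return b, nil
}

// rollDie returns a pseudo-random number from 1 to 6.
// It uses math/rand, referred to by the alias "mrand".
func rollDie() int {
	return mrand.Intn(6) + 1
}

func main() {
	token, err := randomToken(8)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("token: %x\n", token)
	fmt.Println("die roll:", rollDie())
}
```

Without the mrand alias, both imports would try to claim the name rand and the file would not compile.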
What happens if you import into _?
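One way to explore the question: a blank import runs the package's init functions without binding a name, so it is used purely for side effects. The sketch below assumes crypto/sha256 as the example package, since its init registers SHA-256 with the generic crypto package. The helper sha256Hex is made up for this illustration.

```go
package main

import (
	"crypto"
	_ "crypto/sha256" // blank import: only the package's init side effects are wanted
	"fmt"
)

// sha256Hex hashes data through the generic crypto.Hash registry.
// It never names the sha256 package directly; the blank import above
// is what makes crypto.SHA256 available.
func sha256Hex(data string) string {
	if !crypto.SHA256.Available() {
		return "" // the implementation was not linked into the binary
	}
	h := crypto.SHA256.New()
	h.Write([]byte(data))
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	fmt.Println(sha256Hex("abc"))
}
```

The underscore also keeps the compiler from rejecting the import as unused.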
So far, we've limited ourselves to packages included with the Go standard library.
We can use go get to install packages from the internet.
The GOPATH environment variable tells the Go tool where your workspace is located.
go get github.com/dsymonds/fixhub/cmd/fixhub
The go get command fetches source repositories from the internet and places them in your workspace.
How do you choose what version of a package you want with go get?
With GOPATH-style go get, you can't! (Go modules, introduced in Go 1.11, later became the official answer.) Before that, several unofficial community-led projects set out to solve the Go versioning problem.
All of these work on a vendor subdirectory and install packages there instead of in the global namespace, $GOPATH/src.
go install builds a local package and caches it in the pkg directory, similar to `go build`
go list lists the Go package in the current directory; go list ./... lists the buildable packages below it recursively
go doc shows documentation for the provided input, e.g.:
go doc fmt.Println
    $GOPATH/
        bin/fixhub                          # installed binary
        pkg/darwin_amd64/                   # compiled archives
            github.com/...
        src/                                # source repositories
            github.com/
                golang/lint/...             # used by package fixhub
                    .git
                google/go-github/...        # used by package fixhub
                    .git
                dsymonds/fixhub/
                    .git
                    client.go
                    cmd/fixhub/fixhub.go    # package main
Doc comments appear immediately before the declaration of an exported identifier:
    // Join concatenates the elements of elem to create a single string.
    // The separator string sep is placed between elements in the resulting string.
    func Join(elem []string, sep string) string {
        // ...
    }
These are complete sentences beginning with the exact identifier. Everything public should be documented!
The godoc tool extracts such comments and presents them on the web:
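To give a feel for how a tool can extract such comments, here is a minimal sketch using the standard go/parser and go/ast packages. It is not how godoc itself is implemented. The package greet and the function exportedDocs are made up for this illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// src is a hypothetical package used only for this illustration.
const src = `package greet

// Hello returns a greeting for the named person.
func Hello(name string) string { return "Hello, " + name }

// helper is unexported, so it is skipped below.
func helper() {}
`

// exportedDocs maps each exported top-level function in source
// to the text of its doc comment.
func exportedDocs(source string) (map[string]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "greet.go", source, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	docs := make(map[string]string)
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || !fn.Name.IsExported() {
			continue // not a function, or not part of the public API
		}
		docs[fn.Name.Name] = strings.TrimSpace(fn.Doc.Text())
	}
	return docs, nil
}

func main() {
	docs, err := exportedDocs(src)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for name, doc := range docs {
		fmt.Printf("%s: %s\n", name, doc)
	}
}
```

Only the comment directly attached to the declaration is picked up, which is why doc comments must sit immediately before the identifier they describe.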
HTTP (Hypertext Transfer Protocol) is a client-server protocol. Remember that a server is an application that listens for incoming requests from clients and returns an appropriate response.
When you access a page on the web, you (the client) make an HTTP request to the webserver hosting the page, and you get the HTML from the server as a response.
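The request/response exchange can be sketched in a few lines of Go. This is an illustration under assumptions: helloHandler and simulateRequest are hypothetical names, and net/http/httptest stands in for a real client so that no network port is opened.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// helloHandler plays the server role: it receives a request
// and writes an HTML response.
func helloHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintf(w, "<h1>You requested %s</h1>", r.URL.Path)
}

// simulateRequest plays the client role. It builds a GET request for path,
// hands it to the handler, and returns the status code and body.
func simulateRequest(path string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	rec := httptest.NewRecorder()
	helloHandler(rec, req)

	resp := rec.Result()
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	return resp.StatusCode, string(body)
}

func main() {
	status, body := simulateRequest("/index.html")
	fmt.Println(status, body)

	// To serve real clients, one would instead register the handler and listen:
	//   http.HandleFunc("/", helloHandler)
	//   log.Fatal(http.ListenAndServe(":8080", nil))
}
```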
HTTP is a protocol to communicate on the web
It consists of verbs (methods such as GET and POST) applied to resources.
GET Requests
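    resp, err := http.Get("https://httpbin.org/get")
    if err != nil {
        log.Fatal(err) // resp is nil on error, so check before using it
    }
    defer resp.Body.Close()
    body, err := io.ReadAll(resp.Body) // ioutil.ReadAll before Go 1.16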
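A complete, runnable version of the GET pattern might look like the sketch below. To avoid depending on an external site, it assumes a throwaway local server from net/http/httptest. The names fetch and demoGet are made up for this illustration.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetch performs a GET request and returns the response body as a string.
func fetch(target string) (string, error) {
	resp, err := http.Get(target)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// demoGet starts a local test server (standing in for a real website)
// and fetches its root page.
func demoGet() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "hello from the test server")
	}))
	defer srv.Close()
	return fetch(srv.URL)
}

func main() {
	body, err := demoGet()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(body)
}
```

Calling fetch("https://httpbin.org/get") would exercise the same code against a real server.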
POST Requests
Use http.Post or http.PostForm.
Sending Data
Form data is represented with the url.Values type.
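As a sketch of how these pieces fit together, the program below sends one form field with http.PostForm. The handler greetHandler, the field name "name", and the local test server are assumptions made for this illustration.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// greetHandler is a hypothetical server-side handler that reads
// the "name" form field from the request.
func greetHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintf(w, "Hello, %s!", r.FormValue("name"))
}

// postName sends name as form data to target and returns the response body.
func postName(target, name string) (string, error) {
	form := url.Values{}
	form.Set("name", name)

	resp, err := http.PostForm(target, form)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// demoPost runs postName against a throwaway local server.
func demoPost(name string) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(greetHandler))
	defer srv.Close()
	return postName(srv.URL, name)
}

func main() {
	reply, err := demoPost("Gopher")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(reply)
}
```

http.PostForm URL-encodes the values and sets the Content-Type header to application/x-www-form-urlencoded. http.Post is the more general call, taking an explicit content type and body.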
The status code of a response object resp is given by resp.StatusCode.
Named constants for the status codes are defined in the net/http package.
To actually check for HTTP status code errors in Go:
    if resp.StatusCode != http.StatusOK { // http.StatusOK == 200
        // handle the unexpected status
    }
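The check matters because http.Get reports an error only when the request itself fails. A 404 response still comes back with a nil error. The sketch below shows this against a local test server that always answers 404. The names checkStatus and demoNotFound are made up for this illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// checkStatus returns an error when the response status is not 200 OK.
func checkStatus(resp *http.Response) error {
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return nil
}

// demoNotFound requests a page from a server that always replies 404.
// It returns the error from http.Get and the error from checkStatus separately.
func demoNotFound() (transportErr, statusErr error) {
	srv := httptest.NewServer(http.NotFoundHandler())
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		return err, nil
	}
	defer resp.Body.Close()
	return nil, checkStatus(resp)
}

func main() {
	transportErr, statusErr := demoNotFound()
	fmt.Println("http.Get error:", transportErr)
	fmt.Println("status check:", statusErr)
}
```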
APIs, or Application Programming Interfaces, specify how to interact with a piece of software
Lots of services on the web provide APIs that usually communicate data in JSON
Remember JSON?
    {
        "id": 1,
        "name": "A green door",
        "price": 12.50,
        "tags": ["home", "green"]
    }
Revisit the previous lecture for how to handle JSON in Go
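As a quick refresher, the object above can be decoded with encoding/json. This is a minimal sketch: the struct name Product and the function parseProduct are made up for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Product mirrors the JSON object; the struct tags map
// lowercase JSON keys to exported Go fields.
type Product struct {
	ID    int      `json:"id"`
	Name  string   `json:"name"`
	Price float64  `json:"price"`
	Tags  []string `json:"tags"`
}

// sample is the JSON document from the slide.
const sample = `{
	"id": 1,
	"name": "A green door",
	"price": 12.50,
	"tags": ["home", "green"]
}`

// parseProduct decodes a JSON document into a Product.
func parseProduct(data []byte) (Product, error) {
	var p Product
	err := json.Unmarshal(data, &p)
	return p, err
}

func main() {
	p, err := parseProduct([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", p)
}
```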
HTML, or HyperText Markup Language, is a standardized format for the contents of a webpage
HTML documents are made of elements (tags) that have nested content and attributes
Most tags have an opening and closing tag
<a href="http://www.google.com">content</a>
HTML documents form a tree-like structure, with <html> as the root
Since so much data is on the web, and some of it may not be available via a convenient API, web scraping is a means for programmatically extracting data from the web
Web scraping can be done with several languages - what are some benefits of using Go?
There are several techniques and strategies for web scraping
To extract data from a page, you need to be familiar with the structure of the HTML document
    <html>
        <h1>I am a heading!</h1>
        <div>
            <p>
                <a href="http://www.google.com">Google</a>
            </p>
        </div>
        <div>
            <a href="http://www.yahoo.com">Yahoo</a>
        </div>
        <a href="http://www.bing.com">Outside link</a>
        <p>Hi I am a paragraph and I am <strong>bold</strong></p>
    </html>
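To make the structure concrete, the sketch below walks this document tag by tag and collects every link. It swaps in the standard library's encoding/xml tokenizer, which works here only because this small example happens to be well-formed XML. Real-world HTML usually is not, so it needs a proper HTML parser. The function extractLinks is made up for this illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// page is the example document from the slide.
const page = `<html>
<h1>I am a heading!</h1>
<div> <p> <a href="http://www.google.com">Google</a> </p> </div>
<div> <a href="http://www.yahoo.com">Yahoo</a> </div>
<a href="http://www.bing.com">Outside link</a>
<p>Hi I am a paragraph and I am <strong>bold</strong></p>
</html>`

// extractLinks returns the href attribute of every <a> element in doc,
// in document order. It assumes doc is well-formed XML.
func extractLinks(doc string) ([]string, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var links []string
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break // reached the end of the document
		}
		if err != nil {
			return nil, err
		}
		start, ok := tok.(xml.StartElement)
		if !ok || start.Name.Local != "a" {
			continue // only opening <a> tags are of interest
		}
		for _, attr := range start.Attr {
			if attr.Name.Local == "href" {
				links = append(links, attr.Value)
			}
		}
	}
	return links, nil
}

func main() {
	links, err := extractLinks(page)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, link := range links {
		fmt.Println(link)
	}
}
```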
We'll be using the goquery package:
go get github.com/PuerkitoBio/goquery
See the full documentation at pkg.go.dev/github.com/PuerkitoBio/goquery
goquery, inspired by jQuery (a popular JavaScript library), uses CSS selectors to query and manipulate HTML documents.
Some examples:
    "p"           -> Selects all <p> elements
    "p, a"        -> Selects all <p> and <a> elements
    ".test-class" -> Selects all elements with class="test-class"
    "#test-id"    -> Selects all elements with id="test-id"
    "p a"         -> Selects all <a> elements inside <p> elements
    "p > a"       -> Selects all <a> elements with parent <p>
More complete guides to CSS selectors are available online.
    doc, err := goquery.NewDocument("http://metalsucks.net")
    if err != nil {
        log.Fatal(err)
    }

    // Find the review items
    doc.Find(".sidebar-reviews article .content-block").Each(func(i int, s *goquery.Selection) {
        // For each item found, get the band and title
        band := s.Find("a").Text()
        title := s.Find("i").Text()
        fmt.Printf("Review %d: %s - %s\n", i, band, title)
    })
Equivalently, we can use range:
    sel := doc.Find(".sidebar-reviews article .content-block")
    for i := range sel.Nodes {
        band := sel.Eq(i).Find("a").Text()
        title := sel.Eq(i).Find("i").Text()
        fmt.Printf("Review %d: %s - %s\n", i, band, title)
    }